- Brown, James Dean. (2014). The Future of World Englishes in Language Testing. Language Assessment Quarterly, 11, 5-26.
Abstract: This article begins by defining World Englishes (WEs) and the related paradigm of inner-, outer-, and expanding-circle English(es). The discussion then turns to the central concerns of the WEs and language testing (LT) communities with regard to how English tests can best be constructed to include various WEs by discussing (a) what language testers need to understand about WEs (i.e., that the English native speaker norm is no longer sacred and that three different perspectives on English diversity may prove useful in LT) and (b) what language testers need to convey to WEs advocates (i.e., that LT is already contributing to the understanding of linguistic variation, that LT is not ignoring WEs issues, and that LT is much more than the standardized international tests). The article ends with seven recommendations that should make the intersection of WEs and LT more productive. Adapted from the source document
Keywords: applied linguistics, language testing and assessment, Language Tests, Language Variation, Language Varieties, New Englishes, English, English as an International Language, Language Diversity, English as a Second Language Tests
- Brunfaut, Tineke. (2014). Interview: A Lifetime of Language Testing: An Interview with J. Charles Alderson. Language Assessment Quarterly, 11, 103-109.
Abstract: Professor J. Charles Alderson is interviewed. Adapted from the source document
Keywords: applied linguistics, language testing and assessment, Interviews, Linguists, Teachers, Language Tests
- Park, Kwanghyun. (2014). Corpora and Language Assessment: The State of the Art. Language Assessment Quarterly, 11, 27-44.
Abstract: This article outlines the current state of and recent developments in the use of corpora for language assessment and considers future directions with a special focus on computational methodology. Since corpora began to make inroads into language assessment in the 1990s, test developers have increasingly used them as a reference resource to become well versed in the linguistic characteristics of expert and novice speakers' usage and to identify the test construct. In regard to developing and validating language tests, large representative corpora, learner corpora, and specialized corpora have been actively used, as these corpora have made it possible to systematically compare the linguistic features associated with expert users with those found in learner language. Recent advances in computational approaches to assessment can facilitate this comparison to a great extent using technologies in automated essay scoring and learner language analysis. As an emerging area in the field of language assessment, corpus-based research should extend to less explored areas including compilation and longitudinal analysis of developmental corpora, fine-grained microanalysis of learners' development, and assessment attuned to individual learners who use different linguistic varieties. Adapted from the source document
Keywords: applied linguistics, language testing and assessment, Corpus Linguistics, Language Proficiency, Experts versus Novices, Language Acquisition, Language Tests
- Zhang, Limei, Goh, Christine C M, Kunnan, Antony John. (2014). Analysis of Test Takers' Metacognitive and Cognitive Strategy Use and EFL Reading Test Performance: A Multi-Sample SEM Approach. Language Assessment Quarterly, 11, 76-102.
Abstract: This study investigates the relationships between test takers' metacognitive and cognitive strategy use through a questionnaire and their test performance on an English as a Foreign Language reading test. A total of 593 Chinese college test takers responded to a 38-item metacognitive and cognitive strategy questionnaire and a 50-item reading test. The data were randomly split into two samples (N = 296 and N = 297). Based on relevant literature, three models (i.e., unitary, higher order, and correlated) of strategy use and test performance were hypothesized and tested to identify the baseline model. Further, cross-validation analyses were conducted. The results supported the invariance of factor loadings, measurement error variances, structural regression coefficients, and factor variances for the unitary model. It was found that college test takers' strategy use affected their lexico-grammatical reading ability significantly. Findings from this study provide empirical and validating evidence for Bachman and Palmer's (2010) model of strategic competence. Adapted from the source document
Keywords: applied linguistics, language testing and assessment, Metacognition, Language Tests, College Students, Cognitive Processes, English as a Second Language Tests
- Dronjic, V. & Helms-Park, R. (2014). Fixed-choice word-association tasks as second-language lexical tests: What native-speaker performance reveals about their potential weaknesses. Applied Psycholinguistics, 35(1), 193-221.
Abstract: Qian and Schedl's Depth of Vocabulary Knowledge Test was administered to 31 native-speaker undergraduates under an 'unconstrained' condition, in which the number of responses to headwords was unfixed, whereas a corresponding group (n = 36) completed the test under the original 'constrained' condition. Results revealed lower accuracy in the unconstrained condition and in paradigmatic versus syntagmatic responses. Native speakers failed to reach the 90% criterion on most unconstrained and many constrained items. Although certain modifications could improve such a test (e.g., eliminating psycholinguistically anomalous headwords, such as adjectives, or presenting responses to headwords discontinuously), two intransigent problems impede test validity. First, collocates in the mental lexicon differ in tightness and vary across dialects, sociolects, and age groups. Second, and more seriously, second-language Depth of Vocabulary Knowledge Tests are likely spot checks of metalinguistic knowledge rather than depth tests that reflect what learners would actually produce in spontaneous utterances. Adapted from the source document
Keywords: applied linguistics, language testing and assessment, Test Validity and Reliability, Mental Lexicon, Vocabulary Size, Language Tests
- Delcenserie, A., Genesee, F., & Gauthier, K. (2013). Language abilities of internationally adopted children from China during the early school years: Evidence for early age effects. Applied Psycholinguistics, 34(3), 541-568.
Abstract: We assessed the language, cognitive, and socioemotional abilities of 27 internationally adopted children from China, adopted by French-speaking parents, 12 of whom had been assessed previously by Gauthier and Genesee. The children were on average 7 years, 10 months old and were matched to nonadopted monolingual French-speaking children on age, gender, and socioeconomic status. Although there were no significant differences between the groups with respect to socioemotional and cognitive development, the adoptees scored significantly lower than the controls on measures of receptive grammar, expressive vocabulary, word definitions, and sentence recall, findings that were similar to those reported by Gauthier and Genesee. Analyses of correlations between the adopted children's language test results and their age at adoption, length of exposure to the adoption language, health, and other developmental problems revealed relatively few significant associations. In contrast, analyses of the relationship between their language test scores and their performance on the recalling sentences subtest suggest a link between performance on these two tests. We speculate on the role that performance on sentence recall might play in mediating differences in language outcomes between the two groups of children.
Keywords: psycholinguistics, child language acquisition, Children, Language Tests, Chinese, French as a Second Language Learning, Language Acquisition, English as a Second Language Learning, Age Effects, Cognitive Development
- O'Sullivan, Barry. (2012). Assessment issues in languages for specific purposes. The Modern Language Journal, 96(Supplement 1), 71-88.
Abstract: While Grosse and Voght (1991) set out a well-considered overview of LSP and identified areas in need of development, they limited their observations on the topic of assessment to a short section devoted to what they called the proficiency movement. While it is true that they really did not have a lot to report on at the time they wrote their review, and that this current review highlights much interesting work done in the area in the intervening years, there remains a considerable emphasis on practice rather than on theory, with published papers attempting to identify solutions to given assessment problems, rather than attempting to build cohesive assessment theories. The primary focus of this contribution is on the latter aspect of language assessment. The article first offers a historical overview of the area over the past twenty years, which moves from a brief discussion of the issues highlighted by Grosse and Voght to the theoretical issues that have emerged, and finishing with a critical review of research on issues around assessment in three specific domains (immigration and citizenship, or work and the professions). The article then highlights current needs and priorities, focusing on issues of test usage and introducing the concept of test localization before presenting the core of the argument: a theory of LSP assessment validation. It concludes with a preliminary attempt to exemplify how a theory-driven research agenda can inform future research, and ultimately, practice in the area of LSP assessment over the coming decades. Adapted from the source document
Keywords: applied linguistics, language for special purposes, Language for Special Purposes, Language Tests
- Elder, C., Barber, M., Staples, M., Osborne, R. H., Clerehan, R. & Buchbinder, R. (2012). Assessing health literacy: A new domain for collaboration between language testers and health professionals. Language Assessment Quarterly, 9, 205-224.
Abstract: Health literacy, defined as an individual's capacity to process health information in order to make appropriate health decisions, is the focus of increasing attention in medical fields due to growing awareness that suboptimal health literacy is associated with poorer health outcomes. To explore this issue, a number of instruments, reported to have high internal consistency and strong correlations with general literacy tests, have been developed. However, their validity as measures of the target construct is seldom explored using multiple sources of evidence. The current study, involving collaboration between health professionals and language specialists, set out to assess the validity of the Rapid Estimate of Adult Literacy in Medicine (REALM), which describes itself as a 'reading recognition' test that measures ability to pronounce common medical and lay terms. Drawing on a sample of 310 respondents, including both native and non-native speakers of English, investigations were undertaken to probe the REALM's validity as a measure of understanding the selected terms and to consider associations between scores on this widely used test and those derived from other recognized health literacy tests. Results suggest that the REALM is underrepresenting the health literacy construct and that the test may also be biased against non-native speakers of English. The study points to an expanded role for language testers, working in collaboration with experts from medical disciplines, in developing and evaluating health literacy tools. Adapted from the source document
Keywords: applied linguistics, language testing and assessment, applied linguistics, adult language development/literacy studies, Adult Literacy, Health Care Practitioners, Language Tests, Test Validity and Reliability, Reading Tests
- Fulcher, Glenn. (2012). Assessment literacy for the language classroom. Language Assessment Quarterly, 9, 113-132.
Abstract: Language testing has seen unprecedented expansion during the first part of the 21st century. As a result there is an increasing need for the language testing profession to consider more precisely what it means by 'assessment literacy' and to articulate its role in the creation of new pedagogic materials and programs in language testing and assessment to meet the changing needs of teachers and other stakeholders for a new age. This article describes a research project in which a survey instrument was developed, piloted, and delivered on the Internet to elicit the assessment training needs of language teachers. The results were used to inform the design of new teaching materials and the further development of online resources that could be used to support program delivery. The project makes two significant contributions. First, it provides new empirically derived content for the concept of assessment literacy within which to frame materials development and teaching. Second, it uncovered methodological problems with existing survey techniques that may have impacted upon earlier studies, and solutions to these problems are suggested. Adapted from the source document
Keywords: applied linguistics, language testing and assessment, Language Tests, Surveys, Teachers, Occupations, Language Teaching Methods, Teacher Education
- Bowles, M. A. (2011). Measuring implicit and explicit linguistic knowledge: What can heritage language learners contribute? Studies in Second Language Acquisition, 33, 247-271.
Abstract: Although claims about explicit and implicit language knowledge are central to many debates in SLA, little research has been dedicated to measuring the two knowledge types (R. Ellis, 2004, 2005). The purpose of this study was to validate the use of the battery of tests reported in Ellis (2005) to measure implicit and explicit language knowledge. Whereas Ellis (2005) tested only second-language (L2) learners (of English), this study tested both L2 and heritage language (HL) learners (of Spanish). Results showed that test scores loaded on a two-factor model, as in Ellis (2005), thereby providing construct validity for the tests, on a population of HL learners who have little explicit knowledge by virtue of the environment in which they acquired Spanish. Adapted from the source document
Keywords: applied linguistics, non-native language learning languages other than English, Second Language Learning, Heritage Language, Linguistic Competence, Learning Environment, Knowledge, Language Tests, English as a Second Language Learning, Spanish as a Second Language Learning
- McNamara, T., & Knoch, U. (2012). The Rasch wars: The emergence of Rasch measurement in language testing. Language Testing, 29(4), 555-576.
Abstract: This paper examines the uptake of Rasch measurement in Language Testing through a consideration of research published in Language Testing research journals in the period 1984 to 2009. Following the publication of the first papers on this topic, exploring the potential of the simple Rasch model for the analysis of dichotomous language test data, a debate ensued as to the assumptions of the theory, and the place of the model both within Item Response Theory (IRT) more generally and as appropriate for the analysis of language test data in particular. It seemed for some time that the reservations expressed about the use of the Rasch model might prevail. Gradually, however, the relevance of the analyses made possible by multi-faceted Rasch measurement to address validity issues within performance-based communicative language assessments overcame Language Testing researchers' initial resistance. The paper outlines three periods in the uptake of Rasch measurement in the field, and discusses the research which characterized each period. [Reprinted by permission of Sage Publications, Ltd., copyright holder.]
Keywords: applied linguistics, language testing and assessment, Language Tests, Test Validity and Reliability
- Pellicer-Sanchez, A., & Schmitt, N. (2012). Scoring yes-no vocabulary tests: Reaction time vs. nonword approaches. Language Testing, 29(4), 489-509.
Abstract: Despite a number of research studies investigating the Yes-No vocabulary test format, one main question remains unanswered: What is the best scoring procedure to adjust for testee overestimation of vocabulary knowledge? Different scoring methodologies have been proposed based on the inclusion and selection of nonwords in the test. However, there is currently no consensus on the best adjustment procedure using these nonwords. Two studies were conducted to examine a new methodology for scoring Yes-No tests based on testees' response times (RTs) to the words in the test, on the assumption that faster responses would be more certain and accurate whereas more hesitant and inaccurate ones would be reflected in slower RTs. Participants performed a timed Yes-No test and were then interviewed to ascertain their actual vocabulary knowledge. Study 1 explored the viability of this approach and Study 2 examined whether the RT approach presented any advantage over the more traditional nonword approaches. Results showed that there was no clear advantage for any of the approaches under comparison, but their effectiveness depended on factors like the false alarm rate and the size of participants' overestimation of their lexical knowledge. [Reprinted by permission of Sage Publications, Ltd., copyright holder.]
Keywords: applied linguistics, language testing and assessment, Nonsense Words, Response Time Psychology, Vocabulary, Language Tests
- Frost, K., Elder, C., & Wigglesworth, G. (2012). Investigating the validity of an integrated listening-speaking task: A discourse-based analysis of test takers' oral performances. Language Testing, 29(3), 345-369.
Abstract: Performance on integrated tasks requires candidates to engage skills and strategies beyond language proficiency alone, in ways that can be difficult to define and measure for testing purposes. While it has been widely recognized that stimulus materials impact test performance, our understanding of the way in which test takers make use of these materials in their responses, particularly in the context of listening-speaking tasks, remains predominantly intuitive. Recent studies have highlighted the problems associated with content-related aspects of task fulfilment on integrated tasks, but little attempt has been made to operationalize the way in which content from the input material is integrated into speaking performances. Using discourse data from a trial administration of a pilot for an Oxford English language test, this paper investigates how test takers integrate stimulus materials into their speaking performances on an integrated listening-then-speaking summary task, whether these behaviours are reflected in the relevant rating scale and, by implication, whether the test scores assigned according to this scale reflect real differences in the quality of oral performances. An innovative discourse analytic approach was developed to analyse content-related aspects of performance in order to determine if such aspects represent an appropriate measure of the speaking ability construct. Results showed that the measures devised, such as the number of key points included from the input text, and the accuracy with which information was reproduced or reformulated, effectively distinguished participants according to their level of speaking proficiency. The study's findings support the use of this particular task-type and the appropriateness of the associated rating scale as a measure of speaking proficiency, as well as the utility of the devised discourse-based measures for the validation of integrated tasks in other assessment contexts. [Reprinted by permission of Sage Publications, Ltd., copyright holder.]
Keywords: applied linguistics, language testing and assessment, Language Tests, Discourse Analysis, English Proficiency, Test Validity and Reliability, Speech Production, Oral Language
- Haug, T. (2012). Methodological and theoretical issues in the adaptation of sign language tests: An example from the adaptation of a test to German Sign Language. Language Testing, 29(2), 181-201.
Abstract: Despite the current need for reliable and valid test instruments in different countries in order to monitor the sign language acquisition of deaf children, very few tests are commercially available that offer strong evidence for their psychometric properties. This mirrors the current state of affairs for many sign languages, where very little research is available. No previous empirical study has focused explicitly on the linguistic, methodological, and theoretical issues involved in the process of adapting a test from a source sign language to a target sign language. Problems during the adaptation process can arise from linguistic differences between the source and the target language and differences in the source and the target cultures. Both are important aspects that need to be considered in the adaptation of a sign language test from a source to a target language. This study proposes a model for sign language test adaptation, based on the adaptation of the British Sign Language Receptive Skills Test to German Sign Language. The model includes different methodological steps, with a particular focus on construct validation. [Reprinted by permission of Sage Publications, Ltd., copyright holder.]
Keywords: applied linguistics, language testing and assessment, Sign Language, Language Tests, German, Test Validity and Reliability
- Sato, T. (2012). The contribution of test-takers' speech content to scores on an English oral proficiency test. Language Testing, 29(2), 223-241.
Abstract: The content that test-takers attempt to convey is not always included in the construct definition of general English oral proficiency tests, although some English-for-academic-purposes (EAP) speaking tests and most writing tests tend to place great emphasis on the evaluation of the content or ideas in the performance. This study investigated the relative contribution of linguistic criteria and the elaboration of speech content to scores on a test of speaking proficiency. A speaking test was designed and administered to Japanese undergraduates to determine what criteria English teachers associate with general oral proficiency. Nine raters were recruited to rate 30 students' monologues on three topics, using intuitive judgments of oral proficiency (referred to as Overall communicative effectiveness). Following this, they assigned scores to the monologues using five criteria: Grammatical accuracy, Fluency, Vocabulary range, Pronunciation, and Content elaboration/development. The raters were also asked to provide open-ended written comments on the factors contributing to their intuitive judgments. Statistical analyses of the scores (Rasch measurement, multiple regression, and multivariate generalizability (G) theory analysis) revealed that Content elaboration/development made a substantive contribution to the intuitive judgments and composite score. The present study enriches our understanding of general oral proficiency and the construct definition of proficiency tests. [Reprinted by permission of Sage Publications, Ltd., copyright holder.]
Keywords: applied linguistics, language testing and assessment, Oral Language, English as a Second Language Tests, English Proficiency, Language Tests
- Chapelle, C. A. (2012). Validity argument for language assessment: The framework is simple.... Language Testing, 29(1), 19-27.
Abstract: In this commentary, Chapelle responds to Michael Kane's (same journal issue) article "Validating score interpretations and uses: Messick Lecture, Language Testing Research Colloquium, Cambridge, April 2010." Chapelle elaborates on some issues that Kane's approach raises for language testing based on her experiences with interpretive arguments and validity arguments. Adapted from the source document
Keywords: applied linguistics, language testing and assessment, Language Tests, Test Validity and Reliability
- Davies, A. (2012). Kane, validity and soundness. Language Testing, 29(1), 37-42.
Abstract: In this commentary, Davies responds to Michael Kane's (same journal issue) article "Validating score interpretations and uses: Messick Lecture, Language Testing Research Colloquium, Cambridge, April 2010." Adapted from the source document
Keywords: applied linguistics, language testing and assessment, Test Validity and Reliability, Language Tests
- Kane, M. (2012). Validating score interpretations and uses. Language Testing, 29(1), 3-17.
Abstract: The argument-based approach to validation involves two steps: specification of the proposed interpretations and uses of the test scores as an interpretive argument, and the evaluation of the plausibility of the proposed interpretive argument. More ambitious interpretations and uses tend to involve an extended network of inferences and assumptions and require extensive evidence for their support. Simpler interpretations do not claim much and, therefore, may not require much evidential support. The evaluation of score-based decisions generally requires an evaluation of the consequences of the decision rule. In any case, the claims that are being made need to be justified. [Reprinted by permission of Sage Publications, Ltd., copyright holder.]
Keywords: applied linguistics, language testing and assessment, Test Validity and Reliability, Language Tests
- Oller, J. W. (2012). Grounding the argument-based framework for validating score interpretations and uses. Language Testing, 29(1), 29-36.
Abstract: Kane's argument-based framework is summarized and examined. He implicitly appeals to the backgrounded concepts of fairness and justice. From there it is a short distance to grounding the whole system in the mundane notion of truth. In fact, valid argument systems must depend on representations that are 'true' by virtue of agreement with purported facts. As a friendly amendment, therefore, I argue that (provided the ceteris paribus, all else being equal, requirement is met) agreement with known facts in testing, experimental research, and scientific measurement counts for a great deal more than disagreement. It follows by Peircean 'exact logic' that higher test scores (if the tests have any validity at all) are invariably more informative (interpretable in general) and thus more useful than lower scores. Why? Because higher scores show more agreement between the test-makers and the higher scoring test-takers about whatever facts (or performances) may be at issue. Exceptions are cases where the ceteris paribus requirement is not met. Necessary (but testable) inferences follow for interpretations and uses of 'cutscores.' [Reprinted by permission of Sage Publications, Ltd., copyright holder.]
Keywords: applied linguistics, language testing and assessment, Language Tests, Test Validity and Reliability
- Bunch, M. B. (2011). Testing English language learners under No Child Left Behind. Language Testing, 28(3), 323-341.
Abstract: Title III of Public Law 107-110 (No Child Left Behind; NCLB) provided for creation of assessments of English language learners (ELLs) and established, through the Enhanced Assessment Grant program, a platform from which four consortia of states developed ELL tests aligned to rigorous statewide content standards. Those four tests (ACCESS for ELLs, CELLA, ELDA, and MWA) are now in use in one or more states, along with a host of other commercially available or locally developed tests. The tests (those developed by consortia as well as the others) are quite similar in many ways, principally in their contents: Listening, Speaking, Reading, and Writing. Most measure these domains with a combination of multiple-choice (MC) and open-ended (OE) test items. This article provides an overview of the four consortium-developed tests as well as an in-depth analysis of one representative example. It also provides a summary of the characteristics of four commercially available tests. Not surprisingly, the four commercially available tests are rather similar to one another and to the consortium-developed tests in terms of content, psychometric characteristics, and development. The primary difference between the two sets is that the commercially available tests tend to report percentile ranks as well as proficiency levels. Now that the Race to the Top program is in place, we face many of the same challenges we faced a decade ago when NCLB was passed. While the Enhanced Assessment Grant competition emphasized summative assessment, the latest competition emphasizes formative assessment, which gives rise to the hope that educators can not only discover students' strengths and weaknesses with these new tests, but do so in a timely manner and have opportunities to use the information constructively. Current work by at least one organization is encouraging in this regard. [Reprinted by permission of Sage Publications, Ltd., copyright holder.]
Keywords: applied linguistics, language testing and assessment, Language Tests, English Proficiency, English as a Second Language, Achievement Tests, Educational Policy